Claudia Mueller, University of Stuttgart, cmueller@sonivis.org
[PRIMARY contact]
Lukas Birn, Capgemini sd&m AG,
lukas.birn@capgemini-sdm.com
The open source programming language Processing [1] is used to create the visual model. Processing combines software concepts with principles of visual form and interaction; it is a text-based programming language developed to generate and modify images. Eclipse [2] is used as a convenient programming environment. It is planned to integrate this visual model into the open source software SONIVIS [3].
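As an illustration of this setup, the following minimal sketch shows how such a Processing visualization can be run from an Eclipse Java project (assuming a Processing 1.x/2.x project); the class name VisualModel and the window size are placeholders and not taken from the actual tool.

    import processing.core.PApplet;

    // Minimal Processing-in-Eclipse skeleton (class name and size are placeholders).
    public class VisualModel extends PApplet {

      public void setup() {
        size(1200, 700);    // one view holding the complete timeline
        background(0);
      }

      public void draw() {
        // employee rows, bars and coronas are drawn here
      }

      public static void main(String[] args) {
        PApplet.main(new String[] { "VisualModel" });
      }
    }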
The implemented visualization is designed as a monitoring instrument. It allows interactive exploration of the data, which is displayed in a single view. The provided data is imported directly.
The employees
are placed on the vertical axis in descending order and the timeline is
arranged on the horizontal axis. A green fading bar visualizes the entry of
employees into the embassy. A red bar depicts the attendance of an employee in
the restricted area. White bars visualize traffic on an employee’s computer.
The request and response payloads are displayed as sparklines: the positive amplitude of a white bar indicates the request payload and the negative amplitude the response payload. We define three rule violations. The first rule
violation “piggybacking into/out of restricted area” is displayed by a yellow
corona; the second rule violation “unattended network access” is represented by
an orange corona; and the third rule violation “access to suspicious IP” is
shown by a pink corona. Semitransparent green boxes highlight the existing
alibis of users. Based on the defined rule violations and the alibis, we detect
one employee as the suspect. A semitransparent white box highlights this
employee.
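As a compact summary of this encoding, the legend below sketches how the colours could be declared in a Processing sketch; the exact RGB and alpha values are illustrative assumptions, not the values used in the tool.

    // Illustrative colour legend of the visual model (RGB/alpha values are assumptions).
    color entryGreen       = color(0, 200, 0);         // green fading bar: entry into the embassy
    color classifiedRed    = color(220, 0, 0);         // red bar: attendance in the restricted area
    color trafficWhite     = color(255);               // white bars: network traffic sparklines
    color piggybackYellow  = color(255, 255, 0);       // rule violation 1: piggybacking into/out of restricted area
    color unattendedOrange = color(255, 160, 0);       // rule violation 2: unattended network access
    color suspiciousPink   = color(255, 100, 180);     // rule violation 3: access to suspicious IP
    color alibiGreen       = color(0, 200, 0, 60);     // semitransparent green box: alibi
    color suspectWhite     = color(255, 255, 255, 60); // semitransparent white box: the suspect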
References
[1] Processing, http://processing.org/
[2] Eclipse, http://eclipse.org
[3] SONIVIS, http://www.sonivis.org
Video: A short video presentation of the tool is available here.
ANSWERS:
MC1.1: Identify which computer(s)
the employee most likely used to send information to his contact in a
tab-delimited table which contains for each computer identified: when the
information was sent, how much information was sent and where that information
was sent.
MC1.2: Characterize the
patterns of behavior of suspicious computer use.
Our analytical process consists of six phases: acquire, parse, represent, reasoning, refine, and interact (cf. Figure 1). In the following sections, we briefly explain the first two phases, which are carried out by the software solution, and focus on the remaining four steps.
Figure 1: Analytical reasoning process, high-resolution version available here.
Phases: Acquire and Parse
All available data from Mini Challenge 1 (badge and network traffic) is downloaded. The developed software parses both data sets (proximity card logs and network traffic logs). A description of the software can be found in the section “Description of the technical solution”.
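A sketch of this parsing step in a Processing sketch is given below; the file names and column positions are assumptions and have to be adapted to the actual log formats.

    // Hypothetical parsing of the two log files; file names and column order are assumptions.
    String[] proxLines = loadStrings("proxLog.csv");
    for (int i = 1; i < proxLines.length; i++) {          // skip the header line
      String[] col = split(trim(proxLines[i]), ',');
      String time  = col[0];                              // timestamp of the badge event
      int employee = int(col[1]);                         // badge / employee number
      String event = col[2];                              // prox-in-building, prox-in-classified, prox-out-classified
      // store the event on the employee's timeline ...
    }

    String[] ipLines = loadStrings("IPLog.csv");
    for (int i = 1; i < ipLines.length; i++) {
      String[] col    = split(trim(ipLines[i]), ',');
      String time     = col[0];                           // timestamp of the traffic event
      String sourceIP = col[1];                           // computer that sent the request
      String destIP   = col[2];                           // target of the request
      int port        = int(col[3]);
      int reqSize     = int(col[4]);                      // request payload in bytes
      int respSize    = int(col[5]);                      // response payload in bytes
      // store the traffic record for the source computer ...
    }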
Phase: Represent
Firstly, we define the requirements for the visual model. One view should contain all necessary information. The visualization should allow an interactive exploration of the data, and it should be reusable as a monitoring instrument for similar questions.
Secondly, an appropriate visualization library is needed to present the available data. We decided to use the open source programming language Processing because of its low barrier to entry and the possibility of integrating it with the visual analytics software SONIVIS.
Thirdly, the basic visual model is specified. The presentation of the time-dependent employee activity data should be as simple as possible so that all information can be grasped quickly.
Finally, the initial visual model (v0.5) is programmed, containing only the proximity card logs. Employees are organized on the vertical axis in descending order. The timeline is arranged on the horizontal axis. There are three types of events: prox-in-building, prox-in-classified, and prox-out-classified. When an employee enters the embassy, a green fading bar shows the “fuzziness” of this event; a red bar depicts the employee’s attendance in the restricted area. If an employee leaves the restricted area without a preceding “prox-in-classified” event, or enters without a previous departure, a rule violation exists. This rule violation “piggybacking into/out of restricted area” is displayed by a yellow corona (cf. Figure 2).
Figure 2: Visual Model (v0.7), high-resolution version available here.
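A possible implementation of this piggybacking check is sketched below; events holds one employee’s prox events in chronological order, and markViolation() is an assumed helper that places the yellow corona at the offending event.

    // Hypothetical check for rule violation 1, "piggybacking into/out of restricted area".
    // "events" holds one employee's prox events in chronological order;
    // markViolation() is an assumed helper that adds the yellow corona.
    void checkPiggybacking(String[] events) {
      boolean inClassified = false;
      for (int i = 0; i < events.length; i++) {
        if (events[i].equals("prox-in-classified")) {
          if (inClassified) markViolation(i);     // enters again without a previous departure
          inClassified = true;
        } else if (events[i].equals("prox-out-classified")) {
          if (!inClassified) markViolation(i);    // leaves although no entry was recorded
          inClassified = false;
        }
      }
    }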
Phase: Reasoning I
The visual
model reveals three employees, nos. 30, 38 and 49, who go against policy and
piggyback (enter or leave the restricted area without badging in or out by
following a co-worker who did badge in or out).
However, the
available information is not sufficient to identify the suspicious person. The
network traffic logs are therefore integrated in the visual model.
Phase: Refine I
The initial
visual model is enhanced to reveal the target person. The IP traffic data
contains the sizes of the request and the response in bytes, the port, the
source IP address and the destination IP address. A white bar visualizes
traffic on an employee’s computer. The request and response payloads are displayed in a sparkline-like manner: the positive amplitude of the white bar indicates the request payload and the negative amplitude the response payload.
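The following sketch illustrates how a single traffic event could be drawn in Processing; the row baseline, the maximum bar height of ten pixels and the scaling via map() are assumptions.

    // Hypothetical drawing of one traffic event as a sparkline-like white bar.
    // rowY is the baseline of the employee's row, x the horizontal position of the event.
    void drawTraffic(float x, float rowY, int reqSize, int respSize, int maxSize) {
      stroke(255);                                    // white bar
      float up   = map(reqSize, 0, maxSize, 0, 10);   // request payload: positive amplitude
      float down = map(respSize, 0, maxSize, 0, 10);  // response payload: negative amplitude
      line(x, rowY, x, rowY - up);                    // screen y grows downwards, so "up" is rowY - up
      line(x, rowY, x, rowY + down);
    }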
Phase: Reasoning II
This version of the visual model (v0.8) shows the employees’ arrival at the embassy, their attendance in the restricted area, and the activity of their computers, including the request and response sizes.
For the
following analytical process, two assumptions are defined. Firstly, we presume
that only one person from the embassy is suspected of sending data to an
outside criminal organization. Secondly, we assume that each employee has access to every computer in the embassy, but that each computer is assigned to exactly one employee and should be used only by this employee. For example, only employee no. 10 uses the computer with the IP address 37.170.100.10. Therefore, we define our first hypothesis: “There should be no traffic on a personal computer during the absence of the defined user.”
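This assignment can be expressed, for example, by a small helper that derives the employee number from the last octet of the workstation’s IP address; the subnet 37.170.100.x is taken from the example above, the helper itself is an assumption.

    // Hypothetical helper: computer 37.170.100.<n> is assigned to employee no. <n>.
    int assignedEmployee(String sourceIP) {
      String[] octet = split(sourceIP, '.');
      if (octet.length == 4 && octet[0].equals("37")
          && octet[1].equals("170") && octet[2].equals("100")) {
        return int(octet[3]);     // last octet = employee number
      }
      return -1;                  // not an embassy workstation
    }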
Phase: Refine II
We apply this
hypothesis to define our second rule violation “unattended network access”. It
is represented by an orange corona in the visual model (cf. Figure 3).
Figure 3: Visual Model (v0.8), high-resolution version available here.
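One possible way to test this rule is sketched below; the interval arrays and the time representation in seconds are assumptions over the parsed prox data.

    // Hypothetical check for rule violation 2, "unattended network access":
    // traffic on a computer while its defined user is inside the restricted area.
    // classifiedIn/classifiedOut hold the matching entry/exit times of that user (in seconds).
    boolean unattendedAccess(long trafficTime, long[] classifiedIn, long[] classifiedOut) {
      for (int i = 0; i < classifiedIn.length; i++) {
        if (trafficTime >= classifiedIn[i] && trafficTime <= classifiedOut[i]) {
          return true;            // the defined user cannot sit at this computer right now
        }
      }
      return false;
    }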
Phase: Reasoning III
After applying this rule violation to our data set, we reveal eight events of computer activity that occur while the computer’s defined user is in the restricted area. The timelines of employees nos. 15, 16, 31, 41, 52 and 56 show this rule violation.
Based on our findings, we check the target IP address to which the data was sent. During unattended network access, the same target address is always used: the IP address 100.59.151.133. We define our second hypothesis: “The revealed IP address belongs to the criminal organization and every data transfer to this address is unauthorized.”
Phase: Refine III
We define our
third rule violation “access to suspicious IP”. It is represented by a pink corona.
A new version of our visual model is implemented (v0.9) showing all the defined
rule violations (cf. Figure 4).
Figure 4: Visual Model (v0.9), high-resolution version available here.
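The corresponding test is straightforward; the constant below repeats the address revealed in the Reasoning III phase, while the function name is an assumption.

    // Rule violation 3, "access to suspicious IP": every transfer to 100.59.151.133 is flagged.
    String suspiciousIP = "100.59.151.133";

    boolean suspiciousTransfer(String destIP) {
      return destIP.equals(suspiciousIP);   // marked with a pink corona in the visual model
    }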
Phase: Reasoning IV
Activities on the computers of eight further employees fall under this rule violation. These are employees nos. 8, 10, 13, 16, 18, 20, 31 and 32. In total, the defined rule violations disclose 13 employees with unusual behavior. We therefore regard them as suspects.
At the present time, we are not able to narrow the number down to one person. A further adaptation of our visual model is necessary. Therefore, we define the third
hypothesis: “Staying in the restricted area or using the defined user’s
computer can be seen as an alibi which excludes these employees from being the
suspect.”
Phase: Refine IV
We define two restrictions on the visualization: firstly, remove all users staying in the restricted area while an unauthorized data transfer happens, and secondly, remove all users using their defined computer at the very moment the unauthorized data transfer happens. For the second restriction, a time lag of two seconds is permitted between the unauthorized data transfer and the usage of the defined user’s computer. Consequently, an employee is not the target person if he used his computer within two seconds before or after the unauthorized data transfer.
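A sketch of this alibi test is given below; inClassifiedAt() and ownComputerActiveBetween() are assumed helpers over the parsed prox and traffic data, and t is the time of the unauthorized transfer in seconds.

    // Hypothetical alibi test for one employee and one unauthorized transfer at time t (seconds).
    boolean hasAlibi(int employee, long t) {
      // restriction 1: the employee is inside the restricted area while the transfer happens
      if (inClassifiedAt(employee, t)) return true;
      // restriction 2: the employee uses his own computer within +/- 2 seconds of the transfer
      if (ownComputerActiveBetween(employee, t - 2, t + 2)) return true;
      return false;
    }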
The final visual model contains all employees to whom the second and third rule violations apply. Semitransparent green boxes highlight the existing alibis of users staying in the restricted area while an unauthorized data transfer takes place (cf. Figure 5).
Phase: Reasoning V
The defined
restrictions reduce the number of displayed employees and their visualized
activities. Now the target person can be identified very easily. Based on the
defined rule violations and restrictions, we detect employee no. 48 as the
suspect. A semitransparent white box highlights this employee.
Figure 5: Visual Model (v1.0), high-resolution version available here.
Phase: Interact
Methods are added to the final visual model to manipulate all available data more conveniently and to allow the user to control the visualized employee list. During the analytical reasoning process, we realized that, first of all, an overview of the data is needed. The user should then be able to zoom into and filter the data and to retrieve details on demand. We therefore added further functions to the visualization, which are shown in the video contribution of our submission.
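The sketch below indicates how such functions can be added in a Processing sketch; the key bindings and the helper toggleSuspectFilter() are assumptions and do not reproduce the exact controls shown in the video.

    // Hypothetical interaction handlers (key bindings and helpers are assumptions).
    float zoom = 1.0;                               // horizontal zoom factor of the timeline

    void keyPressed() {
      if (key == '+') zoom *= 1.1;                  // zoom into the timeline
      if (key == '-') zoom *= 0.9;                  // zoom out towards the overview
      if (key == 'f') toggleSuspectFilter();        // filter: show only employees with rule violations
    }

    void mouseMoved() {
      // determine the event under the cursor and display its details on demand
    }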